Abstract

We consider asynchronous message-passing systems in which some links are timely and processes
may crash. Each run defines a timeliness graph among correct processes: (p, q) is an edge of the
timeliness graph if the link from p to q is timely (that is, there is a bound on communication delays
from p to q). The main goal of this paper is to approximate this timeliness graph by graphs having some
properties (such as being trees, rings, ...). Given a family S of graphs, for runs such that the timeliness
graph contains at least one graph in S, using an extraction algorithm, each correct process has to converge
to the same graph in S that is, in a precise sense, an approximation of the timeliness graph of the run. For
example, if the timeliness graph contains a ring, then using an extraction algorithm, all correct processes
eventually converge to the same ring, and in this ring all nodes will be correct processes and all links will
be timely.

We first present a general extraction algorithm and then a more specific extraction algorithm that is
communication efficient (i.e., eventually all the messages of the extraction algorithm use only links of
the extracted graph).
1 Introduction



Designing fault-tolerant protocols for asynchronous systems is highly desirable but also highly complex. 
Some classical agreement problems such as consensus and reliable broadcast are well-known tools for 
solving more sophisticated tasks in faulty environments (e.g., [?]). Roughly speaking, with consensus
processes must reach a common decision on their inputs, and with reliable broadcast processes must deliver
the same set of messages. 

It is well known that consensus cannot be solved in asynchronous systems with failures [14], and
several mechanisms were introduced to circumvent this impossibility result: randomization [?], partial
synchrony [11, 12] and (unreliable) failure detectors [6].

Informally, a failure detector is a distributed oracle that gives (possibly incorrect) hints about process
crashes. Each process can access a local failure detector module that monitors the processes of the system
and maintains a list of processes that are suspected of having crashed. 

Several classes of failure detectors have been introduced, e.g., 𝒫, 𝒮, Ω, etc. Failure detector classes can
be compared by reduction algorithms, so for any given problem P, a natural question is "What is the weakest
failure detector (class) that can solve P?". This question has been extensively studied for several problems
in systems with infinite process memory (e.g., uniform and non-uniform versions of consensus [5, 13],
non-blocking atomic commit [?], uniform reliable broadcast [1, 19], implementing an atomic register in a
message-passing system [?], mutual exclusion [10], boosting obstruction-freedom [16], set consensus [21, 22],
etc.). This question, however, has not been as extensively studied in the context of systems with finite
process memory.

In this paper, we consider systems where processes have finite memory, processes can crash and links
can lose messages (more precisely, links are fair lossy and FIFO¹). Such environments can be found in many
systems. For example, in sensor networks, sensors are typically equipped with small memories, they can crash
when their batteries run out, and they can experience message losses if they use wireless communication.

In such systems, we consider (the uniform versions of) reliable broadcast, consensus and repeated
consensus. Our contribution is threefold: First, we establish that the weakest failure detector for reliable
broadcast is 𝒫⁻, a failure detector that is almost as powerful as the perfect failure detector 𝒫. Next, we
show that consensus can be solved using failure detector 𝒮. Finally, we prove that 𝒫⁻ is the weakest failure
detector for repeated consensus. Since 𝒮 is strictly weaker than 𝒫⁻, in some precise sense these results
imply that, in the systems that we consider here, consensus is easier to solve than reliable broadcast, and
reliable broadcast is as difficult to solve as repeated consensus.

The above results are somewhat surprising because, when processes have infinite memory, reliable
broadcast is easier to solve than consensus³ and repeated consensus is not more difficult to solve than
consensus.

Roadmap. The rest of the paper is organized as follows: In the next section, we present the model
considered in this paper. In Section ??, we show that in case of process memory limitation and possibility of
crashes, 𝒫⁻ is necessary and sufficient to solve reliable broadcast. In Section ??, we show that consensus
can be solved using a failure detector of type 𝒮 in our systems. In Section ??, we show that 𝒫⁻ is necessary
and sufficient to solve repeated consensus in this context.

For space considerations, all the proofs are relegated to an optional appendix. 

2 Informal Model 

Graphs. We begin with some definitions and notations concerning graphs. For a directed graph G =
(N, E), Node(G) and Edge(G) denote N and E, respectively. Given a graph G and a set M ⊆ Node(G),
G[M] is the subgraph of G induced by M, i.e., G[M] is the graph (M, Edge(G)[M]) where (p, q) ∈
Edge(G)[M] if and only if p, q ∈ M and (p, q) ∈ Edge(G).

The tuple (X, Y) is a directed cut (dicut for short) of G if and only if X and Y define a partition of
Node(G) and there is no directed edge (y, x) ∈ Edge(G) such that x ∈ X and y ∈ Y. We say that G'
is a dicut reduction from G if there exists a dicut (X, Y) of G such that G' = G[X]. A set S of graphs is
dicut-closed if and only if it is closed under dicut reduction, namely if G ∈ S then all the graphs obtained
by a dicut reduction of G are in S.
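
The definitions above can be illustrated with a short Python sketch. The representation of a graph as a set of nodes and a set of edge pairs, the function names induced, is_dicut and dicut_reductions, and the choice to require both parts of a partition to be non-empty are assumptions made for this illustration only.

```python
from itertools import combinations

def induced(edges, m):
    """G[M]: the nodes of M plus every edge of G with both endpoints in M."""
    m = frozenset(m)
    return m, frozenset((p, q) for (p, q) in edges if p in m and q in m)

def is_dicut(nodes, edges, x, y):
    """True if (x, y) partitions nodes and no edge goes from y to x.
    Assumption: both parts of the partition must be non-empty."""
    x, y = frozenset(x), frozenset(y)
    if not x or not y or x & y or x | y != frozenset(nodes):
        return False
    return not any(p in y and q in x for (p, q) in edges)

def dicut_reductions(nodes, edges):
    """Brute-force list of all G[X] for dicuts (X, Y); exponential in |nodes|."""
    nodes = sorted(nodes)
    result = []
    for k in range(1, len(nodes)):
        for xs in combinations(nodes, k):
            ys = set(nodes) - set(xs)
            if is_dicut(nodes, edges, xs, ys):
                result.append(induced(edges, xs))
    return result

# Example graph: x is isolated, y and z form a 2-cycle.
NODES = {"x", "y", "z"}
EDGES = {("y", "z"), ("z", "y")}
```

On this example, ({x}, {y, z}) is a dicut because no edge enters x, whereas ({y}, {x, z}) is not, because of the edge (z, y).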

Processes and Links. We consider distributed systems composed of n processes which communicate by
message-passing through directed links. We denote the set of processes by Π = {p1, ..., pn}. We assume
that the communication graph is complete, i.e., for each pair of distinct processes (p, q), there is a directed
link from p to q.

¹ The FIFO assumption is necessary because, from the results in [20], if lossy links are not FIFO, reliable broadcast requires
unbounded message headers.

Note that V C and is unrealistic according to the definition in 

³ With infinite memory and fair lossy links, (uniform) reliable broadcast can be solved using Θ [4], and Θ is strictly weaker than
(Σ, Ω), which is necessary to solve consensus.
A process may fail by crashing, in which case it definitively stops its local algorithm. A process that
never crashes is said to be correct, faulty otherwise.

The (directed) links are reliable, i.e., every message sent through a link (p, q) is eventually received by
q if q is correct, and if a message m from p is received by q, m is received by q at most once, and only if p
previously sent m to q.

The links being reliable, an implementation of reliable broadcast [?] is possible. A reliable broadcast
is defined with two primitives: rbroadcast(m) and rdeliver(m). Informally, after a correct
process p invokes rbroadcast(m), all correct processes eventually rdeliver(m); after a faulty process
p invokes rbroadcast(m), either all correct processes eventually rdeliver(m) or correct processes
never rdeliver(m).

Timeliness. To simplify the presentation, we assume the existence of a discrete global clock. This is 
merely a fictional device: the processes do not have access to it. We take the range T of the clock's ticks to 
be the set of natural numbers. 

We assume that every correct process p is timely, i.e., there is a lower and an upper bound on the 
execution rate of p. Correct processes also have clocks that are not necessarily synchronized but we assume 
that they can accurately measure intervals of time. 

A link (p, q) is timely if there is an unknown bound δ such that no message sent by p to q at time t may
be received by q after time t + δ.

A timeliness graph is simply a directed graph whose set of nodes is a subset of Π. The timeliness graph
represents the timeliness properties of the links. Intuitively, for timeliness graph G, Node(G) is the set of
correct processes and (p, q) is in Edge(G) if and only if the link (p, q) is timely.

Runs. An algorithm A consists of n deterministic (infinite) automata, one for each process; the automaton
for process p is denoted A(p). The execution of an algorithm A proceeds as a sequence of process steps.
Each process performs its steps atomically. During a step, a process may send and/or receive some messages
and change its state.

A run r of algorithm A is a tuple r = (T, I, E, S) where T is a timeliness graph, I is the initial state
of the processes in Π, E is an infinite sequence of steps of A, and S is a list of increasing time values
indicating when each step in E occurred. A run must satisfy the usual properties concerning sending and
receiving messages. Moreover, we assume that (1) all correct processes make an infinite number of steps:
p ∈ Node(T) if and only if p makes an infinite number of steps in E, and (2) the timeliness of links is
deduced from the timeliness graph: (p, q) ∈ Edge(T) if and only if the link (p, q) is timely in E.

In the following, for run r = (T, I, E, S), T(r) denotes T, the timeliness graph of r, and Correct(r)
is the set of correct processes for the run r, namely, Correct(r) = Node(T(r)). Note that by definition,
(p, q) is a timely link if and only if (p, q) ∈ Edge(T).

Remark that in the definition given here a link may be timely even if no message is sent on the link.
If link (p, q) is FIFO (i.e., messages from p to q are received in the order they are sent) and p regularly
sends messages to q, then the timeliness of these messages implies the timeliness of the link itself. So in the
following we always assume that links are FIFO.

2.1 Some Systems 

We say that timeliness graph G is compatible with timeliness graph G' if and only if (1) Node(G) =
Node(G') and (2) Edge(G) ⊆ Edge(G'). By extension, timeliness graph G is compatible with run r if G
is compatible with T(r), the timeliness graph of r. Hence, timeliness graph G is compatible with run r if
Node(G) is the set of correct processes in r and if (p, q) is an edge of G then (p, q) is timely in r.
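
As a minimal sketch of the compatibility relation, assuming a graph is represented as a pair (set of nodes, set of edges), the test reduces to one equality and one inclusion. The names compatible and run_in_system, and the sample graphs, are hypothetical and introduced only for this illustration.

```python
def compatible(g, t):
    """G is compatible with T: same node set, and every edge of G is an edge of T."""
    (g_nodes, g_edges), (t_nodes, t_edges) = g, t
    return set(g_nodes) == set(t_nodes) and set(g_edges) <= set(t_edges)

def run_in_system(system, t_r):
    """A run with timeliness graph t_r belongs to the runs of a system
    if some graph of the system is compatible with t_r."""
    return any(compatible(g, t_r) for g in system)

# Timeliness graph of a hypothetical run: three correct processes, four timely links.
T_R = ({"a", "b", "c"}, {("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")})
RING_ABC = ({"a", "b", "c"}, {("a", "b"), ("b", "c"), ("c", "a")})
RING_ACB = ({"a", "b", "c"}, {("a", "c"), ("c", "b"), ("b", "a")})
```

Here RING_ABC is compatible with T_R, while RING_ACB is not because the link (c, b) is not timely in the run.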
A system X is defined as a set of timeliness graphs. The set of runs of system X, denoted R(X), is the
set of all runs r such that there exists a timeliness graph G in X compatible with r.
Below, we define the systems considered in this paper:

• ASYNC is the set of all timeliness graphs G such that Edge(G) = ∅. In ASYNC there is no
timeliness assumption about links and R(ASYNC) is the set of all runs in an asynchronous system.

• COMPLETE is the set of all complete graphs whose nodes are the subsets of Π.

• STAR is the set of all timeliness graphs with a source, i.e., G ∈ STAR if and only if Node(G) ⊆ Π
and there exists p0 ∈ Node(G) (the center of the star or the source) such that Edge(G) = {(p0, q) | q ∈
Node(G) \ {p0}}. Clearly a run r is in R(STAR) if and only if there is at least one source in r.

• TREE is the set of all timeliness graphs G that are rooted directed trees, i.e., |Edge(G)| = |Node(G)| −
1 and there exists p0 in Node(G) such that ∀q ∈ Node(G), there is a directed path of G from p0 to q.
Clearly a run r is in R(TREE) if and only if there is at least one timely path from a correct process
to all correct processes.

• RING is the set of all timeliness graphs G such that G is a directed cycle (a ring). Clearly a run r is
in R(RING) if and only if there is a timely (directed) cycle over all correct processes.

• SC is the set of all timeliness graphs that are strongly connected. Clearly, a run r is in R(SC) if and
only if there exists a (directed) timely path between each pair of distinct correct processes.

• BIC is the set of all timeliness graphs G such that for all p, q ∈ Node(G), there exist at least two
distinct paths from p to q. BIC corresponds to the set of 2-strongly-connected graphs. Clearly, a run r
is in R(BIC) if and only if there exist at least two distinct timely paths between each pair of distinct
correct processes.

• PAIR is the set of all timeliness graphs G such that Edge(G) = {(p0, p1), (p1, p0)} with p0, p1 ∈
Node(G) and p1 ≠ p0. Clearly, a run r is in R(PAIR) if and only if there exist two distinct correct
processes p0 and p1 such that (p0, p1) and (p1, p0) are timely links.
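
Membership in some of these systems can be checked mechanically. The following Python sketch gives possible predicates for the star, tree, ring and strongly connected cases, under the assumption that a graph is given as a set of nodes and a set of edge pairs; the function names are hypothetical, and a ring is assumed to have at least two nodes.

```python
def reachable(edges, start):
    """Set of nodes reachable from start by directed paths (start included)."""
    seen, stack = {start}, [start]
    while stack:
        p = stack.pop()
        for (u, v) in edges:
            if u == p and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def is_star(nodes, edges):
    """Some center c has an edge to every other node, and there are no other edges."""
    nodes, edges = set(nodes), set(edges)
    return any(edges == {(c, q) for q in nodes - {c}} for c in nodes)

def is_tree(nodes, edges):
    """|Edge| = |Node| - 1 and some root reaches every node."""
    nodes, edges = set(nodes), set(edges)
    if len(edges) != len(nodes) - 1:
        return False
    return any(reachable(edges, r) == nodes for r in nodes)

def is_strongly_connected(nodes, edges):
    nodes = set(nodes)
    return bool(nodes) and all(reachable(edges, p) == nodes for p in nodes)

def is_ring(nodes, edges):
    """A strongly connected graph with as many edges as nodes is a single directed cycle."""
    nodes, edges = set(nodes), set(edges)
    return len(edges) == len(nodes) >= 2 and is_strongly_connected(nodes, edges)
```

The ring test relies on a counting argument: strong connectivity forces every node to have at least one outgoing edge, so with exactly |Node(G)| edges each node has exactly one successor.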

3 Extraction Algorithms 

Given a system X, the goal of an extraction algorithm is to ensure that in each run r in X, all correct 
processes eventually agree on the same element of X and that this element is, in some precise sense, an 
approximation of the timeliness graph of run r. 

For example, in RING, all processes have to eventually agree on some ring and this ring has to be
compatible with the timeliness graph of the run. In particular this ring contains all the correct processes.
However, the compatibility relation may be too strong: in many systems, it is not possible to distinguish
between a crashed process and a correct one, so the graph G on which the processes eventually agree may
contain crashed processes, and then the graph is not exactly compatible with the run. We therefore weaken the
compatibility requirement and impose only that the subgraph of G induced by the set of correct processes of
the run is compatible with the timeliness graph of the run and is a dicut reduction of G (or G itself).

We now formally define what an extraction algorithm is. First, in such an algorithm, every process p
maintains a local variable G_p which contains a timeliness graph. Then, we say that an algorithm extracts a
timeliness graph in X if and only if for every run r in R(X) there is a timeliness graph G (called the extracted
graph) such that:

• Convergence: for all correct processes p there is a time t after which G_p = G

• Compatibility: G[Correct(r)] is compatible with T(r)

• Closure: G[Correct(r)] is a dicut reduction of G or is equal to G

• Validity: G is in X
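
The four properties can be checked on a finished (hypothetical) run once the final value of each G_p is known. The sketch below assumes graphs are pairs of frozensets (nodes, edges) and that final_graphs maps every correct process to the graph it has converged to; the function name check_extraction and the sample data are assumptions made for illustration.

```python
def check_extraction(final_graphs, correct, t_r, system):
    """Evaluate Convergence, Compatibility, Closure and Validity for one run."""
    values = list(final_graphs.values())
    g = values[0]
    g_nodes, g_edges = g
    t_nodes, t_edges = t_r
    # Edges of G[Correct(r)].
    sub_edges = {(p, q) for (p, q) in g_edges if p in correct and q in correct}
    faulty = g_nodes - correct
    return {
        "convergence": all(v == g for v in values),
        "compatibility": correct <= g_nodes and correct == t_nodes and sub_edges <= t_edges,
        # (Correct(r), faulty) is a dicut iff no edge goes from a faulty node to a correct one.
        "closure": correct <= g_nodes
                   and not any(p in faulty and q in correct for (p, q) in g_edges),
        "validity": g in system,
    }

# Hypothetical run: c crashes, the link (a, b) is timely.
CORRECT = frozenset({"a", "b"})
T_R = (CORRECT, frozenset({("a", "b")}))
STAR_A = (frozenset({"a", "b", "c"}), frozenset({("a", "b"), ("a", "c")}))
STAR_C = (frozenset({"a", "b", "c"}), frozenset({("c", "a"), ("c", "b")}))
SYSTEM = [STAR_A, STAR_C]
```

In this example the star centered at a satisfies all four properties even though it contains the crashed process c, whereas the star centered at c violates Closure because of its edges from c to correct processes.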

Remark that for all systems that contain ASYNC there is a trivial extraction algorithm: for each run
processes extract the graph G such that Node(G) = Π and Edge(G) = ∅.

A more constrained version of the extraction problem is the following: an algorithm A extracts exactly
timeliness graphs in X if for every run r in system X, the extracted graph G is compatible with T(r). In
this case, all correct processes eventually know the exact set of correct processes: it is the set of nodes of
the extracted graph.

Some Results about Extraction Algorithms. First we show that an extraction algorithm may help to 
route messages using only timely links: 

Lemma 3.1 Let G be a graph extracted from run r. If (p, q) is in Edge(G) and q is a correct process, then
p is correct.

Proof. By contradiction, assume that p is not correct. Then (Correct(r), Node(G) − Correct(r)) is not a
dicut because (p, q) ∈ Edge(G), p ∈ Node(G) − Correct(r) and q ∈ Correct(r), which contradicts the
Closure property. □

From this lemma and the Compatibility property, we deduce directly:

Proposition 3.2 If (p = p0, ..., pi, ..., q = pm) is a path in the extracted graph and p and q are correct
processes, then for every i such that 0 ≤ i < m the link (pi, pi+1) is timely and process pi is correct.

From a practical point of view, this proposition shows that the extracted graph may be used to route
messages between processes using only timely links: the route from p to q is a path in the extracted graph
(if any). All intermediate nodes are correct processes and agree on the extracted graph and hence on the path.

For example with TREE, the tree extracted by the algorithm makes it possible to route messages from the root of
the tree to any other process, and the routing uses only timely links.
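
A minimal sketch of such routing, assuming the extracted graph is available as a set of directed edge pairs: a breadth-first search returns a shortest path, and every correct process that runs it on the same extracted graph computes the same route because edges are explored in sorted order. The function name route and the sample tree are hypothetical.

```python
from collections import deque

def route(edges, src, dst):
    """Shortest directed path from src to dst in the extracted graph, or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        p = queue.popleft()
        if p == dst:
            path = []
            while p is not None:      # walk back to src through the parents
                path.append(p)
                p = parent[p]
            return path[::-1]
        for (u, v) in sorted(edges):  # sorted: deterministic at every process
            if u == p and v not in parent:
                parent[v] = p
                queue.append(v)
    return None

# Hypothetical extracted tree rooted at r.
TREE_EDGES = {("r", "a"), ("r", "b"), ("a", "c")}
```

With this tree, messages from the root r reach c through a, while no route exists from c back to r.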

Generally, the main goal of the extraction algorithm is not only to extract a graph G in X but also to
ensure that G[Correct(r)] is in X (even if the processes do not know the set of correct processes). In
particular, this property is ensured if X is dicut-closed: the Closure property implies that G[Correct(r)] is
in X.

Among the systems we consider, only system PAIR is not dicut-closed: H = ({x}, ∅) is a dicut
reduction of G = ({x, y, z}, {(y, z), (z, y)}) but is not in PAIR. It is easy to verify that every other
previously introduced system is dicut-closed. For these systems we obtain:

Proposition 3.3 Consider any extraction algorithm for the system X. 

• If X = STAR, then the center of the extracted star is a correct process.

• If X = TREE, then the root of the extracted tree is a correct process.

• If X ∈ {SC, COMPLETE, RING, BIC}, then the extraction is exact.
Proof. For STAR and TREE, all the dicut reductions of the extracted graph contain at least, respectively,
the center and the root. Hence the restriction of the extracted graph to the correct processes contains at
least these nodes, proving that they are correct processes.

There is no dicut for a strongly connected graph. Hence in SC there is no dicut reduction, and by the
Closure property the subgraph of the extracted graph induced by the set of correct processes is the extracted
graph itself. COMPLETE, RING, and BIC are particular cases of systems composed only of strongly
connected timeliness graphs. □

An immediate consequence of Proposition 3.3 is that any extraction algorithm gives an implementation
of eventual leader election (failure detector Ω) for systems STAR and TREE, as well as an implementation
of failure detector ◇𝒫 for systems COMPLETE, RING, SC and BIC.

Due to the lack of space, the proofs of the two following propositions have been moved to the appendix.
In the first proposition we show that extraction is not always possible. Actually, in the proof we exhibit
a non dicut-closed system, namely PAIR, for which no extraction algorithm can be implemented.

Proposition 3.4 There exist some systems X for which there is no extraction algorithm.

In the next section we show that for all dicut-closed systems there is an extraction algorithm. For systems
like STAR, TREE and PAIR, there exists no exact extraction algorithm.

Proposition 3.5 There exist some systems X for which there is an extraction algorithm and there is no exact 
extraction algorithm. 

4 An Extraction Algorithm 

The aim of this section is to show that the dicut-closed property of a system is sufficient to solve the
extraction problem. To that end, we propose in Figure 1 an extraction algorithm, called A(X), for dicut-closed
systems X.

The basic idea of Algorithm A(X) is to make processes select a graph that is compatible with the
timeliness graph of the run. For this, each process maintains for each graph x in X an accusation counter
Acc[x]. This counter grows without bound if some correct process is not in x or if some directed edge of x
is not timely. Then, Acc[x] is bounded if and only if x contains all correct processes and all edges of x
between pairs of correct processes are timely.

We implement accusation counters as follows. A process regularly blames all the graphs in X in which
it is not a node: it increments the accusation counters of all these graphs. Note that if the process is correct
this accusation is justified, and if the process is not correct, after some time the process, being dead, stops
incrementing the accusation counters. Moreover, each process regularly sends alive messages on its outgoing
links. Each process maintains an estimate of the communication delay for each incoming link (Δ[q]
for the incoming link (q, p)). If it does not receive alive messages within this estimate on some incoming
link, it blames all timeliness graphs in X containing this link (i.e., increments the accusation counters for
these graphs). As the estimate of the communication delay may be too short, each time it is exceeded the
process increases it for the link. In this way, if the link is timely, at some time the estimate will be greater
than the bound on communication delay.
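
The effect of increasing the estimate on each expiry can be seen by replaying the arrival times of alive messages on a single incoming link. The following sketch is a simplified single-link simulation over integer time; the function name replay_link, the arrival schedule and the horizon are assumptions made for illustration, and the increment of 1 per expiry follows the description above.

```python
def replay_link(arrivals, horizon):
    """Replay alive-message arrival times on one incoming link up to `horizon`.
    Returns (number of timer expiries, final delay estimate)."""
    delta = 1                 # initial estimate of the communication delay
    deadline = delta          # the timer is armed at time 0
    timeouts = 0
    pending = sorted(arrivals)
    i = 0
    while True:
        nxt = pending[i] if i < len(pending) else None
        if nxt is not None and nxt <= deadline and nxt <= horizon:
            deadline = nxt + delta   # alive received in time: restart the timer
            i += 1
        elif deadline <= horizon:
            timeouts += 1            # timer expired: the link would be accused
            delta += 1               # enlarge the estimate
            deadline += delta        # and restart the timer
        else:
            return timeouts, delta
```

When alive messages arrive every 5 time units, the estimate stops growing once it reaches 5 and the number of expiries stays at 4; when no message ever arrives (for instance because the sender has crashed), the number of expiries keeps growing with the horizon.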

The accusation counters are broadcast by reliable broadcasts. Each time a process receives a new value
of an accusation counter it updates its own accusation counter to the maximum of the received value and its
current value. Hence, if some timely graph stops being blamed then all correct processes eventually agree
on the value of its accusation counter.
By selecting the graph G with the lowest accusation value, if any (to break ties, we assume a total order
among the graphs of X), correct processes eventually agree on the same timeliness graph of X. Moreover, we
can prove that this graph contains all the correct processes and that all its edges between correct processes
are timely links. As a consequence, the Convergence, the Compatibility and the Validity properties of
the extraction algorithm are ensured. Nevertheless, this graph can also contain faulty processes and edges
between correct and faulty processes.

Consider now the Closure property. If G contains only correct processes then the Closure property
is trivially satisfied. Otherwise, G contains Correct(r) and a set F of faulty processes. In this case,
(Correct(r), F) is a dicut of G: indeed, if there were an edge in G from a faulty process q to a
correct process p, eventually the process p would stop receiving messages from q and the accusation counter of
G would grow infinitely often. Hence, in all cases, the Closure property is satisfied.

Hence, if X is dicut-closed, Algorithm A(X) extracts a graph in X. Moreover, from Proposition 3.3, if
all the graphs of X are strongly connected then the algorithm exactly extracts a graph in X.

In the algorithm, each process p uses local timers, one per process. The timer of p dedicated to q is
set (by setting settimer(q) to a positive value) to a time interval rather than an absolute time. The timer is
decremented until it expires. When the timer expires, timerexpire(q) becomes true. Note that a timer
can be restarted before it expires.

In the algorithm, we denote by ≺ the total order relation on X and by ≺_lex (see Line 2) the total order
relation defined as follows: ∀x, y ∈ X, ∀c_x, c_y ∈ ℕ, (c_x, x) ≺_lex (c_y, y) ≡ [c_x < c_y ∨ (c_x = c_y ∧ x ≺ y)].
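
The lexicographic order on (counter, graph) pairs maps directly onto tuple comparison. In the sketch below, graphs are identified by hypothetical names and the total order on X is given by their position in a list; the function name select is an assumption made for illustration.

```python
def select(acc, order):
    """Return the graph x minimizing (acc[x], x) in the lexicographic order:
    smallest accusation counter first, ties broken by the total order on graphs."""
    rank = {x: i for i, x in enumerate(order)}
    return min(order, key=lambda x: (acc[x], rank[x]))

ORDER = ["g1", "g2", "g3"]   # hypothetical total order g1 < g2 < g3 on the graphs of X
```

For instance, with counters 3, 1, 1 for g1, g2, g3 the selected graph is g2: it ties with g3 on the counter and precedes it in the order.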

Code for each process p

 1: Procedure updateExtractedGraph()
 2:   G ← x such that (Acc[x], x) = min≺lex {(Acc[x'], x') such that x' ∈ X}

 3: On initialization:
 4:   for all x ∈ X do Acc[x] ← 0
 5:   for all q ∈ Π \ {p} do
 6:     Δ[q] ← 1
 7:     settimer(q) ← Δ[q]
 8:   updateExtractedGraph()
 9:   start tasks 1 and 2

10: task 1:
11:   loop forever
12:     send(alive) to every q ∈ Π \ {p} every K time
13:     rbroadcast(ACC, ⊥, p) every K time   /* to accuse graphs that do not contain p */

14: task 2:
15:   upon receive(alive) from q do
16:     settimer(q) ← Δ[q]
17:   upon timerexpire(q) do
18:     rbroadcast(ACC, q, p)   /* to accuse graphs that contain the link (q, p) */
19:     Δ[q] ← Δ[q] + 1
20:     settimer(q) ← Δ[q]
21:   upon rdeliver(ACC, q, h) do   /* information from h */
22:     for all x ∈ X do
23:       if q = ⊥ then
24:         if h ∉ Node(x) then Acc[x] ← Acc[x] + 1
25:       else
26:         if (q, h) ∈ Edge(x) then Acc[x] ← Acc[x] + 1
27:     updateExtractedGraph()

Figure 1: Algorithm A(X) extracts a graph in X
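
The handling of delivered accusations (Lines 21 to 27) and the selection of Line 2 can be sketched in Python as follows. This is a local, single-process fragment only: timers, alive messages and reliable broadcast are not modeled, the class name ExtractionState is hypothetical, and the total order on X is assumed to be the position of each graph in the list passed to the constructor.

```python
BOTTOM = None  # stands for the ⊥ field of an accusation message

class ExtractionState:
    """Accusation counters and extracted graph of one process (sketch)."""

    def __init__(self, system):
        # system: list of (nodes, edges) pairs, listed in the assumed total order
        self.system = list(system)
        self.acc = [0] * len(self.system)
        self.extracted = self._select()

    def _select(self):
        # Line 2: minimum of (Acc[x], x) in the lexicographic order
        i = min(range(len(self.system)), key=lambda i: (self.acc[i], i))
        return self.system[i]

    def on_rdeliver_acc(self, q, h):
        # Lines 21 to 27: h accuses either the graphs not containing h (q = ⊥)
        # or the graphs containing the link (q, h)
        for i, (nodes, edges) in enumerate(self.system):
            if q is BOTTOM:
                if h not in nodes:
                    self.acc[i] += 1
            elif (q, h) in edges:
                self.acc[i] += 1
        self.extracted = self._select()

RING_ABC = ({"a", "b", "c"}, {("a", "b"), ("b", "c"), ("c", "a")})
RING_ACB = ({"a", "b", "c"}, {("a", "c"), ("c", "b"), ("b", "a")})
```

Starting from the two rings above, a single accusation of the link (a, b) by b makes the process switch from RING_ABC to RING_ACB.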

A sketch of the correctness proof of A(X) is given below. In this sketch, we consider a run r of A(X)
in dicut-closed system X. We denote by var_p^t the value of var of process p at time t.
We first notice that all variables Acc_p[x] are monotonically nondecreasing:

Lemma 4.1 For all times t and t' such that t > t', for all processes p, for all graphs x in X, Acc_p^t[x] ≥
Acc_p^t'[x].

Let sup(Acc_p[x]) be the supremum of Acc_p^t[x] over all t. We say that Acc_p[x] is unbounded if sup(Acc_p[x])
is equal to ∞ and bounded otherwise. As Acc_p[x] is also updated by reliable broadcast each time some
process q modifies Acc_q[x], we have:

Lemma 4.2 For all correct processes p and q, for all graphs x in X, sup(Acc_p[x]) = sup(Acc_q[x]).

Let sup(Acc[x]) be the common value of sup(Acc_p[x]) over all correct processes p; by Lemma 4.2, sup(Acc[x]) is
well-defined. If there is at least one x ∈ X such that sup(Acc[x]) is bounded, then min{sup(Acc[x']) | x' ∈
X} is finite, hence G, the graph such that (Acc[G], G) = min≺lex {(Acc[x'], x') | x' ∈ X}, is well defined.
Then all correct processes converge to the same graph:

Lemma 4.3 If there exists x in X such that sup(Acc[x]) is bounded, then there is a time after which, for
every correct process p, G_p is G.

We now prove the Compatibility property. Consider any timeliness graph x compatible with T(r), and assume
that x ∈ X. Then there is a time t0 after which all faulty processes are dead and the estimates of
communication delays are greater than the bounds of communication delays of the timely links of the run. After
time t0, (1) as x contains all correct processes, no process will blame x because it is not a node of x, and
(2) as all edges of x are timely, no process will blame x for one of its edges. Then:

Lemma 4.4 If x in X is compatible with T(r), then sup(Acc[x]) is bounded.

Reciprocally, let x be a timeliness graph of X that is not compatible with the run. If process p is not
correct, there is a time t after which it does not send any alive message, and there is a time after which the
timers for p expire forever at all correct processes; then, if p is a node of some x ∈ X, Acc[x] is incremented
infinitely often and sup(Acc[x]) = ∞. In the same way, if (p, q) is not timely, by the FIFO property of the
link, the timer for p expires infinitely often at process q, and if (p, q) is an edge of x then Acc_q[x] is
incremented infinitely often and sup(Acc[x]) = ∞.

Then:

Lemma 4.5 For every x in X, if sup(Acc[x]) is bounded then x[Correct(r)] is compatible with T(r).

Hence:

Lemma 4.6 (Compatibility) G[Correct(r)] is compatible with T(r).

It remains to prove that G satisfies the Closure property: G[Correct(r)] is a dicut reduction of G or is
equal to G. As G[Correct(r)] is compatible with T(r), we have:

Lemma 4.7 Correct(r) ⊆ Node(G).

Let F = Node(G) − Correct(r). If F is empty the Closure property is trivially ensured. Consider now the
case where F is not empty. F contains only faulty processes and (Correct(r), F) is a partition of Node(G).
If there were an edge in Edge(G) from a faulty process q to a correct process p, eventually process p would
stop receiving messages from q and the accusation counter of G would be unbounded, contradicting the choice of
G. So, we have:

Lemma 4.8 If F ≠ ∅ then Edge(G) ∩ (F × Correct(r)) = ∅.
Hence, (Correct(r), F) is a dicut of G.

Lemma 4.3 and Lemma 4.4 prove the Convergence property, Lemma 4.6 proves the Compatibility property
and Lemma 4.8 proves the Closure property. Moreover, G is clearly in X, proving the Validity property.
Proposition 3.3 shows that the extraction is exact when all graphs of X are strongly connected. Hence, we can
conclude with the following theorem:

Theorem 4.9 Let X be a dicut-closed system. Algorithm A(X) extracts a graph in X. Moreover, if all
graphs of X are strongly connected, Algorithm A(X) exactly extracts a graph in X.

5 An Efficient Extraction Algorithm 

In this section, we propose another extraction algorithm called AT(X) (Figures 2 and 3). This algorithm is
efficient, meaning that the (correct) processes eventually only send messages along the edges of the extracted
graph.

AT(X) (exactly) extracts a timeliness graph from system X, where (1) X is dicut-closed and (2) for all
graphs g ∈ X there is some process p, called root, such that there is a directed path from p to every node of
g. For example, the TREE and RING systems have this property.

In the following, we refer to these systems as dicut-closed systems with a root. For every graph g in X,
the function root(g) returns a root of g.

In the algorithm, every process p stores several values concerning the graphs x ∈ X such that root(x) =
p: (1) Acc[x] is the accusation counter of x, whose goal is the same as in Algorithm A(X), (2) Prop[x] is a
proposition counter whose goal will be explained later, and (3) Δ[x] gives the expected time for a message
to go from p (the root of x) to all the nodes of x.

Every process also maintains a set variable Candidates. Each element of this set is a 4-tuple composed
of a graph x of X and the newest values of Acc[x], Prop[x], and Δ[x] known by the process (the exact
values are maintained at root(x)). Each element in this set is called a candidate, and each process selects its
extracted graph among the graphs in the candidate elements.

As in Algorithm A(X):

(1) Each process p sends alive messages on its outgoing links and monitors its incoming links. However,
we restrict here the sending of alive messages: process p sends alive messages on its outgoing link
(p, q) only if (p, q) is in a candidate graph.

(2) A candidate graph is blamed if (a) a correct process is not in the graph or (b) a process receives an
out-of-date message through one of its incoming links. In both cases the candidate is definitively removed
from the Candidates sets of all processes. To achieve this goal, the process sends an accusation
message (ACC) using a reliable broadcast and uses an array Heard that ensures that an identical
candidate (that is, the same graph with the same accusation and proposition values) can never be
added again. Moreover, upon delivery of an accusation message for graph x, root(x) increments
Acc[x].

We now present the different mechanisms used to obtain efficiency.

For all graphs x ∈ X, only the process root(x) is allowed to propose x as a candidate to the rest.
Each process p stores its best candidate in its variable me, that is, the least blamed graph x such that
root(x) = p.

• If a process finds in Candidates a better candidate than me, it removes me from Candidates.

• If a process finds that me is better, it adds me to Candidates and sends a new message containing
me (1) to all processes that are not in Node(me), and (2) to the immediate successors of p in me. The
immediate successors in me add me to their Candidates set and relay the new message, and so on.
By the reliability of the links, every correct process that is not in me eventually receives this message
and blames me.

These mechanisms are achieved by the procedure updateExtractedGraph{). This procedure is called 
each time a graph candidate is blamed or a new candidate is proposed. Note that the Candidates set is 
maintained with the set OtherCand (the candidates of other processes), a boolean Local that is true when 
the process has a candidate, and me, the graph candidate. 

A process p may give up a candidate without this candidate being blamed: in this case, p is the root of the
candidate, it finds a better candidate in OtherCand, and it removes me from Candidates. Then p must not
increment Acc[me] when it receives accusations caused by this removal, since these accusations are not
due to delayed messages. This is the purpose of the proposition counter (Prop): in Prop[x], root(x) counts the
number of times it proposes x as a candidate, and it includes this value in each of its new messages (to inform
the other processes of the current value of the counter). Hence, when q wants to blame x, it now includes its own
view of Prop[x] in the accusation message. This accusation is considered legitimate by root(x)
(that is, it causes an increment of Acc[x]) only when the proposition counter inside the message matches
Prop[x]. Also, whenever root(x) removes x from Candidates, root(x) increments Prop[x] and does not
send the new value to the other processes. In this way, accusations due to this removal are ignored.
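The filtering of stale accusations can be illustrated with a minimal sketch. The class name `RootCounters` and its method names are hypothetical. The check itself (the accusation counts only if both the accusation value and the proposition value carried by the message match the root's current values) follows the test used in the rdeliver handler of the algorithm, and the sketch bumps Prop only on a voluntary give-up, as the procedure updateExtractedGraph does.

```python
class RootCounters:
    """State kept by root(x) for one candidate graph x (illustrative sketch)."""

    def __init__(self):
        self.acc = 0        # accusation counter Acc[x]
        self.prop = 0       # proposition counter Prop[x]
        self.local = False  # True while x is currently proposed

    def propose(self):
        """Propose x; return the (acc, prop) pair carried by the `new` message."""
        self.local = True
        return (self.acc, self.prop)

    def give_up(self):
        """Voluntarily withdraw x: bump Prop silently so that the
        accusations triggered by the withdrawal are ignored."""
        self.prop += 1
        self.local = False

    def on_accusation(self, acc_in_msg, prop_in_msg):
        """Return True if the accusation is legitimate (and count it)."""
        if acc_in_msg == self.acc and prop_in_msg == self.prop:
            self.acc += 1
            self.local = False
            return True
        return False


counters = RootCounters()
first_view = counters.propose()   # other processes learn this view
counters.give_up()                # root withdraws x on its own
stale = counters.on_accusation(*first_view)
second_view = counters.propose()
fresh = counters.on_accusation(*second_view)
```

Here the accusation built from the first view is ignored because the root has silently moved on, while the accusation built from the second view is counted.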

For any timely candidate, the accusation counter is bounded, and its proposition counter is increased
each time it is proposed. In this way, the graph with the smallest accusation and proposition values eventually
remains forever in the Candidates set of all correct processes and is chosen as the extracted graph. (This is
done in the procedure updateExtractedGraph().) Moreover, eventually all other candidates are given up
and only this graph remains in Candidates. In this way, only alive messages are sent, and they are sent
along the directed edges of the extracted graph, which ensures efficiency.

Code for each process p

Procedure updateExtractedGraph()
    Let (a_min, min) = min_{<lex} ({(acc, c) such that (c, acc, -, -) ∈ OtherCand} ∪ {(∞, ∞)})
    if (a_min, min) < (Acc[me], me) ∧ Local then    /* Give up me */
        rbroadcast(ACC, me, Acc[me], Prop[me], Δ[me])
        Prop[me] ← Prop[me] + 1
        Local ← false
        Candidates ← OtherCand
    me ← x such that (a, x) = min_{<lex} {(Acc[c], c) such that c ∈ X ∧ root(c) = p}
    if (Acc[me], me) < (a_min, min) ∧ Local = false then    /* Propose me */
        Local ← true
        Candidates ← Candidates ∪ {(me, Acc[me], Prop[me], Δ[me])}
        send(new, me, Acc[me], Prop[me], Δ[me]) to every process not in Node(me)
        for all h ∈ Π \ {p} do
            if (h, p) ∈ Edge(me) then
                Δ[h] ← max(Δ[h], Δ[me])
                settimer(h) ← Δ[h]
            if (p, h) ∈ Edge(me) and h ≠ root(me) then
                send(new, me, Acc[me], Prop[me], Δ[me]) to h
    G ← x such that (a, x) = min_{<lex} {(a', x') such that (x', a', p', d') ∈ Candidates}

Figure 2: Procedure updateExtractedGraph of Algorithm AT(X)
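The comparisons made by the procedure can be sketched in a few lines. This is not the procedure itself: the helper names (`best_other`, `should_give_up`, `should_propose`, `extracted`) are hypothetical, candidates are assumed to be tuples (graph, acc, prop, timeout), and graphs are assumed to be identified by totally ordered strings. The sketch only shows how the lexicographic order on (accusation counter, graph) pairs drives the give-up, propose, and output decisions.

```python
import math


def best_other(other_cand):
    """Least (acc, graph) pair among the candidates of other processes,
    or (inf, None) when there is none (the (inf, inf) sentinel of the figure)."""
    return min(((acc, g) for (g, acc, _prop, _d) in other_cand),
               default=(math.inf, None))


def should_give_up(other_cand, acc_me, me, local):
    """True if a strictly better candidate exists while `me` is proposed."""
    a_min, g_min = best_other(other_cand)
    return local and g_min is not None and (a_min, g_min) < (acc_me, me)


def should_propose(other_cand, acc_me, me, local):
    """True if `me` beats every other candidate and is not yet proposed."""
    a_min, g_min = best_other(other_cand)
    return (not local) and (g_min is None or (acc_me, me) < (a_min, g_min))


def extracted(candidates):
    """Graph output by the process: the least (acc, graph) among all candidates."""
    return min(((acc, g) for (g, acc, _prop, _d) in candidates),
               default=(math.inf, None))[1]


# Hypothetical candidates: (graph, acc, prop, timeout)
others = {("ring_b", 2, 0, 1), ("ring_c", 1, 3, 1)}
chosen = extracted(others | {("ring_a", 0, 0, 1)})
```

Ties on the accusation counter are broken by the order on graph identifiers, which is why `ring_c` with one accusation does not displace `ring_a` with one accusation in the checks below.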

A sketch of the correctness proof of AT(X) is given in the appendix. Then, we can conclude with the
following theorem:

Theorem 5.1 Let X be a dicut-closed system with a root. Algorithm AT(X) efficiently extracts a graph in
X. Moreover, if all graphs of X are strongly connected, Algorithm AT(X) efficiently and exactly extracts a
graph in X.

Code for each process p

On initialization:
    for all x ∈ X such that root(x) = p do
        Acc[x] ← 0; Prop[x] ← 0; Δ[x] ← n
    for all x ∈ X such that root(x) ≠ p do Heard[x] ← (-1, -1)
    for all q ∈ Π \ {p} do Δ[q] ← 1
    OtherCand ← ∅
    Local ← false
    me ← min{x such that x ∈ X ∧ root(x) = p}
    updateExtractedGraph()
    start tasks 1 and 2

task 1:
    loop forever
        send(alive) to every process q such that ∃(x, -, -, -) ∈ Candidates and (p, q) ∈ Edge(x) every time

task 2:
    upon receive(alive) from q do
        settimer(q) ← Δ[q]

    upon timerexpire(q) do    /* Link (q, p) is not timely, blame all candidates that contain (q, p) */
        for all (x, a, pr, d) ∈ OtherCand such that (q, p) ∈ Edge(x) do
            rbroadcast(ACC, x, a, pr, d)
        if (q, p) ∈ Edge(me) then
            rbroadcast(ACC, me, Acc[me], Prop[me], Δ[me])

    upon receive(new, x, a, pr, d) from q do    /* Proposition of a new candidate */
        if p ∉ Node(x) then    /* Blame x that does not contain p */
            rbroadcast(ACC, x, a, pr)
        else
            newCand ← false
            if (x, -, -, -) ∉ OtherCand and Heard[x] < (a, pr) then    /* New candidate */
                newCand ← true
            if ∃(x, ac, prc, dc) ∈ OtherCand with (ac, prc) < (a, pr) then    /* New candidate */
                OtherCand ← OtherCand \ (x, ac, prc, dc)
                newCand ← true
            if newCand then
                OtherCand ← OtherCand ∪ (x, a, pr, d)
                updateExtractedGraph()
                Heard[x] ← (a, pr)
                for all h ∈ Π \ {p} do
                    if (h, p) ∈ Edge(x) then
                        Δ[h] ← max(Δ[h], d)
                        settimer(h) ← Δ[h]
                    if (p, h) ∈ Edge(x) and h ≠ root(x) then send(new, x, a, pr, d) to h

    upon rdeliver(ACC, x, a, pr, d) do
        if root(x) = p then
            if x = me ∧ a = Acc[me] ∧ pr = Prop[me] then    /* Check if the accusation is up to date */
                Acc[me] ← Acc[me] + 1; Δ[me] ← Δ[me] + 1
                Local ← false
        else
            OtherCand ← OtherCand \ (x, a, pr, d)
            if Heard[x] < (a, pr) then Heard[x] ← (a, pr)
        updateExtractedGraph()

Figure 3: Algorithm AT(X) that efficiently extracts a graph in X
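The handling of a received new message relies on the Heard variable to reject versions of a candidate that are not fresher than what the process has already seen. The following sketch isolates that decision for a process that belongs to the candidate. The function name `accept_new` and the use of a Python set and dict for OtherCand and Heard are assumptions made for illustration; timers and relaying are left out.

```python
def accept_new(other_cand, heard, x, a, pr, d):
    """Decide whether (x, a, pr, d) becomes a candidate of this process.

    other_cand -- set of tuples (graph, acc, prop, timeout), mutated in place
    heard      -- dict graph -> last (acc, prop) pair seen, mutated in place
    Returns True if x was adopted as a new or fresher candidate.
    """
    new_cand = False
    current = next((c for c in other_cand if c[0] == x), None)
    # Case 1: x is not a candidate yet and the message is fresher than
    # anything previously heard about x.
    if current is None and heard.get(x, (-1, -1)) < (a, pr):
        new_cand = True
    # Case 2: an older version of x is stored; replace it.
    if current is not None and (current[1], current[2]) < (a, pr):
        other_cand.discard(current)
        new_cand = True
    if new_cand:
        other_cand.add((x, a, pr, d))
        heard[x] = (a, pr)
    return new_cand


other_cand, heard = set(), {}
first = accept_new(other_cand, heard, "g", 0, 0, 1)      # fresh proposal
duplicate = accept_new(other_cand, heard, "g", 0, 0, 1)  # same version again
newer = accept_new(other_cand, heard, "g", 0, 1, 1)      # re-proposal, larger Prop
```

Because (acc, prop) pairs are compared lexicographically, a re-proposal with a larger proposition counter replaces the stored version, while a replay of an already-seen version is dropped even after the candidate has been removed by an accusation.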



6 Conclusion 

Failure detector implementations in partially synchronous models generally use the timeliness properties of
the system to approximate the set of correct (or faulty) processes. In some way, the extraction problem is
a kind of generalization: instead of only searching for the set of correct processes, here we also try to extract
information about the timeliness of links. Besides, our solutions are based on already existing mechanisms
used in failure detectors implementations as in ElO. 

Information about the timeliness of links is useful for the efficiency of fault-tolerant algorithms. In particular,
in any extracted graph, any path between a pair of correct processes consists only of timely links.
This property is particularly interesting for obtaining efficient routing algorithms.

We gave an extraction algorithm for dicut-closed sets of timeliness graphs. Moreover, we proved that the
extraction is exact when all the timeliness graphs are also strongly connected.

Given a dicut-closed system of timeliness graphs with a root, we showed how to efficiently extract a graph
from it. By efficiency we mean a solution in which eventually messages are sent only over the links of
the extracted graph.

It is important to note that the main purpose of the algorithms we proposed is to show the feasibility of
the extraction under some conditions. So, the complexity of our algorithms was not the main focus of this
paper.

As a consequence, our algorithms are somewhat unrealistic because of their high complexity. Giving
more practical solutions will be the purpose of our future work.

A Appendix



A.1 Proof of Proposition 3.4

Proposition 3.4 There exist some systems X for which there is no extraction algorithm.
Sketch of Proof. 

Assume there is an extraction algorithm A for PAIRS with 5 processes.

Consider a run r of A in system PAIRS with T(r) = ({p1, p2, p3, p4, p5}, {(p1, p2), (p2, p1), (p3, p4), (p4, p3)}).
To satisfy the properties of the extraction, ({p1, p2, p3, p4, p5}, {(p1, p2), (p2, p1)}) or
({p1, p2, p3, p4, p5}, {(p3, p4), (p4, p3)}) must be extracted from the run r. There is a time t1 after which r
converges, for example, to ({p1, p2, p3, p4, p5}, {(p1, p2), (p2, p1)}).

Consider now run r' of A in system PAIRS with T(r') = ({p3, p4, p5}, {(p3, p4), (p4, p3)}) such that r and r'
are indistinguishable until time t1 and p1 and p2 crash in r' at time t1 + 1. There is a time t2 after which r' converges
to a graph with the directed edges {(p3, p4), (p4, p3)}.

Consider now that in r all messages from p1 and p2 to {p3, p4, p5} sent after time t1 are delayed until after time
t2. For p5, the runs r and r' are indistinguishable until t2. So, at time t2, p5 outputs a graph with directed edges
{(p3, p4), (p4, p3)}.

Now consider run r'' of A in system PAIRS with T(r'') = ({p1, p2, p5}, {(p1, p2), (p2, p1)}) such that r and r''
are indistinguishable until time t2 and p3 and p4 crash in r'' at time t2 + 1. There is a time t3 after which r'' converges
to a graph with the directed edges {(p1, p2), (p2, p1)}.

Consider again that in the run r all messages from p3 and p4 to {p1, p2, p5} sent after time t2 are delayed until
after t3. For p5, the runs r and r'' are indistinguishable until t3. So, at time t3, p5 outputs a graph with directed edges
{(p1, p2), (p2, p1)}.

Inductively, we can construct the run r in such a way that p5 alternates forever between a graph with directed
edges {(p1, p2), (p2, p1)} and a graph with directed edges {(p3, p4), (p4, p3)} and never converges definitively. This
contradicts the existence of an algorithm that extracts a graph in PAIRS. □

A.2 Proof of Proposition 3.5

Proposition 3.5 There exist some systems X for which there is an extraction algorithm and there is no exact extraction
algorithm.

Sketch of Proof. Consider the system TREE with 3 processes. We prove in the next section that there is an extraction
algorithm for this system. Assume there is an exact extraction algorithm A for this system.

Consider a run r of A in this system with T(r) = ({p1, p2, p3}, {(p1, p2), (p1, p3)}). To satisfy the properties of
the exact extraction, there is a time t1 after which the graph ({p1, p2, p3}, {(p1, p2), (p1, p3)}) is extracted.

Consider now run r' of A in system TREE with T(r') = ({p1, p2}, {(p1, p2)}) such that r and r' are
indistinguishable until time t1 and p3 crashes in r' at time t1 + 1. There is a time t2 after which r' converges to
({p1, p2}, {(p1, p2)}).

Consider now that in r all messages from p3 to {p1, p2} sent after time t1 are delayed until after time t2. For p1, the
runs r and r' are indistinguishable until t2. So, at time t2, p1 outputs ({p1, p2}, {(p1, p2)}).

Inductively, we can construct the run r in such a way that p1 alternates forever between the graph
({p1, p2, p3}, {(p1, p2), (p1, p3)}) and the graph ({p1, p2}, {(p1, p2)}) and never converges definitively. This
contradicts the existence of an algorithm that exactly extracts a graph in TREE. □

A.3 Proof of Theorem 5.1

In this section, we propose a sketch of the correctness proof of the efficient extraction algorithm AT(X) (Figures 2
and 3). In this sketch, we consider a run r of AT(X) in a dicut-closed system with a root, X. We denote by var_p^t
the value of var_p at time t.

We first notice that, for every graph x, the variables Acc[x] and Prop[x] can only be modified by the process root(x)
and are non-decreasing:

Lemma A.1 For all times t and t', t > t', for all processes p, for all graphs x in X such that p = root(x),
Acc_p^t[x] ≥ Acc_p^{t'}[x] and Prop_p^t[x] ≥ Prop_p^{t'}[x].

Consider a graph x such that its root p crashes. Eventually, every process q such that x ∈ OtherCand_q and
(p, q) ∈ Edge(x) reliably broadcasts an accusation for x. This way, x is removed from the OtherCand set of every
correct process and is never added again (because p has crashed), hence:

Lemma A.2 If p is faulty, there exists a time t such that for all graphs x of X with root(x) = p, for all correct
processes q in r, for all t' > t: x ∉ OtherCand_q^{t'}.

As r is a run of X, there exists some timeliness graph o in X such that o is compatible with T(r). In this case,
Nodes(o) = Correct(r) and the process root(o) is a correct process:

Lemma A.3 There exists a timeliness graph o of X such that o is compatible with T(r) and root(o) is a correct
process.

Moreover: 

Lemma A.4 Let o be a timeliness graph of X such that o[Correct(r)] is compatible with T(r) and root(o) is a
correct process. Then Acc_{root(o)}[o] is bounded.

For all correct processes p, for all graphs x in X with root(x) = p, let A[x]_p be the largest value of Acc[x]_p in r
(∞ if Acc[x]_p is unbounded). Let g be the graph with the smallest A[g]_p (ties are broken by the total order on graphs).
Let C be the value of A[g]_p.

Note that from Lemma A.3 and Lemma A.4, C < ∞. Moreover, by construction of g, root(g) is a correct process,
root(g) eventually elects g forever (me_{root(g)} = g), and as a consequence Prop[g]_{root(g)} becomes constant:

Lemma A.5 There exists a time after which me_{root(g)} = g.

Lemma A.6 There exists a time after which Prop[g]_{root(g)} stops changing.

Let P be the largest value of the proposition counter of g (Prop[g]). The following three lemmas are immediate
consequences of Lemma IA31 

Lemma A.7 For every correct process p ≠ root(g), there exists a time after which g ∈ OtherCand_p.

Lemma A.8 There exists a time after which me_{root(g)} = g and Local_{root(g)} = true and OtherCand_{root(g)} = ∅.

Lemma A.9 For every correct process p ≠ root(g), there exists a time after which OtherCand_p = {g} and
Local_p = false.

From Lemmas A.8 and A.9, the algorithm converges to a graph of X:

Lemma A.10 There exists a timeliness graph x ∈ X (actually g) such that every correct process q outputs x forever.

From Lemma A.8 and Lemma A.9, we can deduce that the algorithm is efficient:

Lemma A.11 There is a time after which every correct process p sends messages only to the processes q such that there
is a directed edge (p, q) in Edge(g).

From Lemma A.10 we deduce the Convergence and the Validity properties.

It remains to prove that g satisfies the properties of the approximation: (1) g[Correct(r)] is compatible with T(r),
and (2) g[Correct(r)] is a dicut reduction of g or is equal to g.

When root(g) sets Local to true and me to (g, C, P, -), it sends a message new to all processes (recall that C is the
final value of the accusation counter of g and P is the final value of its proposition counter). As the links are reliable,
all correct processes eventually receive this message. If a correct process q is not in Node(g), it reliably broadcasts
an accusation message ACC. When process root(g) delivers such a broadcast, it increments the accusation counter
of g, contradicting the fact that Acc[g] is bounded by C, hence:

Lemma A.12 Correct(r) ⊆ Node(g).

When a correct process p receives this new message, it sends alive to every process q such that (p, q) is in Edge(g),
and it monitors all incoming links (q, p) such that (q, p) is in Edge(g). If there is a link (a, b) of Edge(g) between two
correct processes a and b, then a regularly sends alive messages to b. By construction of g, b never blames g, so b
receives no out-of-date message. By the FIFO property of the link, the link is timely:

Lemma A.13 g[Correct(r)] is compatible with T(r).

By Lemma A.12, Node(g) = Correct(r) ∪ F.

If F is empty, the Closure property is trivially ensured. We now consider the case where F is not empty. F
contains only faulty processes. If there is an edge in Edge(g) from a faulty process q to a correct process p, eventually
the process p stops receiving messages from q and the accusation counter of g will be incremented, which contradicts
the fact that the accusation counter of g remains equal to C forever. So we have:

Lemma A.14 If F ≠ ∅ then Edge(g) ∩ (F × Correct(r)) = ∅.
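The condition of Lemma A.14 is easy to state as a check on a concrete graph. The sketch below is purely illustrative: the function name `no_edge_from_faulty_to_correct` and the set-based graph representation are assumptions, and F is computed as the nodes of g that are not correct.

```python
def no_edge_from_faulty_to_correct(nodes, edges, correct):
    """Check that Edge(g) ∩ (F × Correct(r)) is empty, where F = Node(g) \\ Correct(r).

    nodes   -- the nodes of g
    edges   -- the directed edges of g as (source, destination) pairs
    correct -- the set of correct processes
    """
    faulty = set(nodes) - set(correct)
    return not any(src in faulty and dst in correct for (src, dst) in edges)


# Hypothetical example: p3 is faulty. An edge into p3 is allowed,
# an edge leaving p3 towards a correct process is not.
nodes = {"p1", "p2", "p3"}
correct = {"p1", "p2"}
allowed = no_edge_from_faulty_to_correct(nodes, {("p1", "p2"), ("p2", "p3")}, correct)
forbidden = no_edge_from_faulty_to_correct(nodes, {("p1", "p2"), ("p3", "p1")}, correct)
```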

We showed Convergence (Lemma A.10), Validity (Lemma A.10), Compatibility (Lemma A.13),
Closure (Lemma A.14), and Efficiency (Lemma A.11). Moreover, Proposition 3.3 shows the exact extraction when
all graphs of X are strongly connected. Hence, we can conclude with the following theorem:

Theorem 5.1 Let X be a dicut-closed system with a root. Algorithm AT(X) efficiently extracts a graph in X. Moreover,
if all graphs of X are strongly connected, Algorithm AT(X) efficiently and exactly extracts a graph in X.